2025-01-16
Structural Topic Modeling (STM) (Roberts et al. 2014) is an extension of LDA that incorporates document metadata
It allows covariates to be included in the model, which can help to explain the relationship between topics and document metadata
Use it when metadata is critical to your research questions (understanding how topics vary across groups, over time, or in relation to specific attributes)
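As a rough sketch of what incorporating metadata means for topic prevalence (the notation follows the usual STM presentation and is not taken from these slides: $\theta_d$ is the topic proportion vector of document $d$, $X_d$ its covariate vector, $\gamma$ a coefficient matrix, $\Sigma$ a covariance matrix):

$$
\text{LDA: } \theta_d \sim \mathrm{Dirichlet}(\alpha)
\qquad
\text{STM: } \theta_d \sim \mathrm{LogisticNormal}(X_d \gamma,\ \Sigma)
$$

In LDA every document shares the same prior, whereas in STM the prior mean shifts with the document's metadata. STM can also let covariates affect topic content (the words within a topic), which is omitted here.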
Difference between sparse and dense matrices
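A minimal sketch of the difference, using a toy document-term matrix in plain Python. The three example documents are made up for illustration, and the dictionary-based sparse format is only one simple way to store non-zero cells; real libraries use more compact layouts.

```python
# Toy corpus (hypothetical example documents)
docs = [
    "topic models need text",
    "sparse matrices save memory",
    "topic models use sparse matrices",
]

# Vocabulary: one column per unique term, in sorted order
vocab = sorted({term for doc in docs for term in doc.split()})
col = {term: j for j, term in enumerate(vocab)}

# Dense document-term matrix: every cell is stored, including zeros
dense = [[0] * len(vocab) for _ in docs]
for i, doc in enumerate(docs):
    for term in doc.split():
        dense[i][col[term]] += 1

# Sparse representation: only non-zero cells, keyed by (row, column)
sparse = {
    (i, j): count
    for i, row in enumerate(dense)
    for j, count in enumerate(row)
    if count != 0
}

total_cells = len(docs) * len(vocab)
print(f"dense stores {total_cells} cells, sparse stores {len(sparse)}")
```

With a realistic corpus the vocabulary has tens of thousands of columns and each document uses only a tiny fraction of them, so the gap between the two counts grows quickly.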
| Phrase | Sentiment Score |
|---|---|
| Let’s go get Italian food | 2.0429166109408983 |
| Let’s go get Chinese food | 1.4094033658140972 |
| Let’s go get Mexican food | 0.38801985560121732 |
| My name is Emily | 2.2286179364745311 |
| My name is Heather | 1.3976291151079159 |
| My name is Yvette | 0.98463802132985556 |
| My name is Shaniqua | -0.47048131775890656 |
Attention: Named Entity Recognition (NER) does not count as an unsupervised approach!
NER models are trained on labeled data (human intervention)
But the method is still in my top 3 of ML methods
| Entity Type | Examples |
|---|---|
| Person/Names | John Doe |
| Organizations | United Nations, Google |
| Locations | Paris, Mount Everest |
| Dates/Times | January 15, 2025 |
| Miscellaneous | Product names, monetary values, percentages |
| Entity Type | Entity | n |
|---|---|---|
| PER | Hitler | 25119 |
| PER | Sohn | 24207 |
| PER | Kohl | 23472 |
| PER | Schröder | 19127 |
| PER | Merkel | 18093 |
| PER | Schmidt | 15862 |
| PER | Helmut Kohl | 12800 |
| PER | Strauß | 12375 |
| PER | Brandt | 11627 |
| PER | Adenauer | 11134 |
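A frequency table like the one above can be built by counting the (type, entity) pairs an NER model returns. The sketch below assumes the NER step has already run. The `mentions` list is hypothetical illustration data, not the corpus behind the table.

```python
from collections import Counter

# Hypothetical NER output: one (entity_type, entity_text) pair per mention
mentions = [
    ("PER", "Kohl"),
    ("PER", "Helmut Kohl"),
    ("PER", "Kohl"),
    ("PER", "Merkel"),
    ("ORG", "United Nations"),
    ("PER", "Merkel"),
    ("PER", "Kohl"),
]

# Count identical (type, text) pairs
counts = Counter(mentions)

# Rows sorted by descending frequency
rows = [(etype, entity, n) for (etype, entity), n in counts.most_common()]

print("| Entity Type | Entity | n |")
print("|---|---|---|")
for etype, entity, n in rows:
    print(f"| {etype} | {entity} | {n} |")
```

Counting raw surface forms keeps "Kohl" and "Helmut Kohl" as separate rows, just as in the table. Merging them would need an extra normalization or entity-linking step.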